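E_{pair} = \sum_i \sum_j E_{ij}    (8a)

where:

E_{ij} = \begin{cases}
  \infty,           & \text{for } r_{ij} < 3 \\
  E^{rep},          & \text{for } 3 \le r_{ij} < R_{ij}^{rep} \\
  \varepsilon_{ij}, & \text{for } R_{ij}^{rep} \le r_{ij} \le R_{ij} \\
  0,                & \text{for } R_{ij} < r_{ij}
\end{cases}    (8b)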

where ε_ij are the pairwise interaction parameters,6,26 and the interactions are
counted for all pairs except the first nearest neighbors along the chain. A strong
soft-core repulsive energy E^rep of about 4kT can be used in the simulations. This
term provides a slightly larger excluded volume for larger amino acids than that
defined by the hard core. The values of the cut-off distances R_ij^rep and R_ij are
given in Table I, below. The values of R_ij were adjusted to approximately mimic the
contact distances employed in the derivation of the binary interaction parameters.
Here, a "native" interaction scale as described by Skolnick, et al.



TABLE I. Compilation of Pairwise Cut-off Distances in Angstroms

A_i        A_j      R_ij^rep    R_ij (attractive) a    R_ij (repulsive)
Small b    Small    4.35        7.03                   6.32
Small      Large    4.57        7.03                   6.32
Large      Large    4.83        7.50                   7.03



a Attractive pair of amino acids. 



b Small amino acids are: Gly, Ala, Ser, Cys, Val, Thr, Pro. 
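
The piecewise definition of Eq. (8b) and the cut-offs of Table I can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original implementation: the function and constant names are hypothetical, the values of ε_ij must be supplied by the caller (they are not listed in this section), and the choice of the "attractive" or "repulsive" cut-off by the sign of ε_ij is an assumption based on the table's column labels. The hard-core radius of 3 is taken from Eq. (8b); this section does not state its units, so it is exposed as a parameter that must be given in the same units as r_ij.

```python
import math

# Cut-off distances from Table I (Angstroms): (R_rep, R_attractive, R_repulsive)
CUTOFFS = {
    ("small", "small"): (4.35, 7.03, 6.32),
    ("small", "large"): (4.57, 7.03, 6.32),
    ("large", "large"): (4.83, 7.50, 7.03),
}

# Footnote b of Table I: residues treated as "small"
SMALL = {"GLY", "ALA", "SER", "CYS", "VAL", "THR", "PRO"}

E_REP = 4.0  # soft-core repulsion in units of kT ("about 4kT" in the text)


def size_class(residue):
    """Classify a three-letter residue code as 'small' or 'large'."""
    return "small" if residue.upper() in SMALL else "large"


def pair_energy(res_i, res_j, r_ij, eps_ij, hard_core=3.0):
    """Energy of one pair following the four cases of Eq. (8b).

    hard_core must be in the same units as r_ij (units are an assumption here).
    """
    # Order the key so that a mixed pair always maps to ("small", "large")
    key = tuple(sorted((size_class(res_i), size_class(res_j)), reverse=True))
    r_rep, r_attractive, r_repulsive = CUTOFFS[key]
    # Assumption: attractive pairs (eps_ij < 0) use the attractive cut-off
    r_cut = r_attractive if eps_ij < 0 else r_repulsive
    if r_ij < hard_core:
        return math.inf
    if r_ij < r_rep:
        return E_REP
    if r_ij <= r_cut:
        return eps_ij
    return 0.0
```

Summing `pair_energy` over all pairs except nearest neighbors along the chain would give the total of Eq. (8a).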



One-body burial interactions 

To facilitate a rapid collapse of the model chain, a centro-symmetric, density 
regularizing term was used that is based on a statistical analysis of single domain 
proteins. This is the only term that uses the assumption that the target protein has a 
single domain. For some increase in computational cost, this term could be omitted. 
The radius of gyration of the protein is given by: 

S = \left( N^{-1} \sum_i (r_{CM} - r_i)^2 \right)^{1/2}    (9)

where r_CM is the position of the center of mass of the globule, and r_i is the
position of the center of mass of the i-th side chain. The size of a single domain
protein is strongly correlated with the number of residues, N, comprising the protein, 
in accordance with: 

S = 1.52 N^{0.38} in lattice units.    (10)
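
As a small numerical illustration of Eqs. (9) and (10), the following Python sketch computes the radius of gyration of a set of side-chain positions and the expected size for a chain of N residues. The function names are hypothetical, and the center of mass is taken as the unweighted mean of the positions, which is an assumption (equal masses for all side chains).

```python
import math


def radius_of_gyration(coords):
    """Eq. (9): RMS distance of side-chain centers from their centroid.

    coords is a sequence of (x, y, z) tuples; equal masses are assumed.
    """
    n = len(coords)
    cm = [sum(c[k] for c in coords) / n for k in range(3)]
    sq_sum = sum(sum((c[k] - cm[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq_sum / n)


def expected_radius(n_residues):
    """Eq. (10): expected radius of gyration, in lattice units."""
    return 1.52 * n_residues ** 0.38
```

For example, `expected_radius(100)` evaluates to roughly 8.7 lattice units.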

The exponent 0.38, obtained from the statistical analysis of single domain
globular proteins,21 is very close to the value of 1/3 expected for a long, collapsed
polymer chain.22 The corresponding potential has the following form:

E_b = \varepsilon_b \sum_i |m_{0,i} - m_i|    (11)

where m_0,i is the target number of amino acids in the i-th spherical shell centered
at the protein's center of mass, and m_i is the number actually found in that shell.
There are three equal-thickness shells within a distance S, and they contain
somewhat more than half of the protein residues. The entire protein is essentially
contained in a sphere of radius equal to 5/3 S. The value of the parameter ε_b was
equal to 0.25-1.0 k_BT, depending on protein size. Larger proteins tend to exhibit a
larger absolute deviation from the above target distribution of mass, and
consequently, a lower penalty for such deviations should be employed.
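
The shell-counting penalty of Eq. (11) might be sketched as below. This is a minimal sketch under assumptions: the function names are hypothetical, only the shells inside the distance S are counted, and the target occupancies must be supplied by the caller because their values are not given in this section.

```python
def shell_counts(distances, s, n_shells=3):
    """Count residues in n_shells equal-thickness shells spanning [0, s).

    distances are the residue distances from the center of mass.
    """
    width = s / n_shells
    counts = [0] * n_shells
    for d in distances:
        idx = int(d // width)
        if idx < n_shells:  # residues beyond s are not counted here
            counts[idx] += 1
    return counts


def burial_energy(distances, s, targets, eps_b=0.5):
    """Eq. (11): eps_b times the summed absolute deviation from the targets.

    targets holds one hypothetical target occupancy per shell; eps_b is in
    units of k_B T (the text quotes a range of 0.25 to 1.0).
    """
    counts = shell_counts(distances, s, len(targets))
    return eps_b * sum(abs(m0 - m) for m0, m in zip(targets, counts))
```

A smaller `eps_b` would be chosen for larger proteins, in line with the remark that their deviations from the target distribution tend to be larger.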

To further enhance rapid collapse, those residues that are within a radius of
2/3 S (a very conservative estimate of the hydrophobic core of a single domain
globular protein) contribute ε_KD(i) to the total energy, where ε_KD(i) is the
Kyte-Doolittle hydrophobicity parameter of the i-th residue.19,24 The scaling factor
1/16 is preferred. This potential (and its scaling with respect to other interactions)
has very little effect on the folded structure, but it improves folding kinetics.
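
The core hydrophobicity term could be sketched as follows. The table holds the standard Kyte-Doolittle hydropathy values; the function name is hypothetical, and the negative sign (so that hydrophobic residues inside the core lower the energy) is an assumption, since the sign convention is not spelled out in the text.

```python
# Standard Kyte-Doolittle hydropathy index per residue
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def core_hydrophobic_energy(residues, distances, s, scale=1.0 / 16.0):
    """Sum the scaled hydropathy of residues closer than 2/3 S to the center.

    residues are three-letter codes; distances are measured from the center
    of mass. The leading minus sign is an assumed convention.
    """
    core_radius = 2.0 * s / 3.0
    total = sum(
        KYTE_DOOLITTLE[res]
        for res, d in zip(residues, distances)
        if d < core_radius
    )
    return -scale * total
```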

Multibody surface exposure term 

Amino acid side groups differ in size and shape. Thus, when a given
side chain is in contact with another amino acid, the fraction of its surface that is
covered depends on the identity of the contacting partner. Appropriate parameters
reflecting this observation (i.e., the surface coverage of particular types of side
chains and the associated statistical-type potential) could be derived from the
statistics of known protein structures. In the present algorithm, each residue can
have 30 surface contact points. A subset of these contact points becomes occupied
upon contact with other side chains or main chain Cα atoms. The Cα atom positions
are approximated from the positions of three consecutive side chain beads and have
their own excluded volume and contribution to surface coverage. Due to
"shadowing," i.e., one residue being covered by another, some contact points could
be multiply occupied by different residues (usually 1 or 2, sometimes 3, but very
rarely 4 or more). The fraction of occupied surface points defines the fraction of
buried area of a given side chain. The total surface energy of a model protein is
computed as:

E_{surface} = \sum_i E_b(A_i, a_i)    (12)

where a_i is the number of covered contact points of amino acid side chain A_i, and
E_b(A_i, a_i) is the statistical potential for an amino acid A_i that is covered at a_i
contact points, i.e., its coverage fraction is a_i/30 when the number of contact
points is 30. The reference state for this statistical potential is "an average" amino
acid with average (over the structural database) coverage. One scaling factor ε_s for
this term has been determined to be 0.25, although other scaling can be used.
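
The bookkeeping behind Eq. (12) can be illustrated with a short Python sketch. It is a sketch under assumptions: the function names and the layout of the `potential` lookup table (one list of 31 values per residue type, indexed by the number of covered points) are hypothetical, and the statistical potential itself must come from a structural database that is not reproduced here.

```python
N_CONTACT_POINTS = 30  # surface contact points per residue, as in the text


def coverage_fraction(occupied_points):
    """Fraction of contact points that are occupied at least once.

    occupied_points lists point indices, possibly repeated because of
    "shadowing"; each point is counted only once.
    """
    return len(set(occupied_points)) / N_CONTACT_POINTS


def surface_energy(residues, covered_counts, potential, eps_s=0.25):
    """Eq. (12) scaled by eps_s.

    potential[res][a] is a hypothetical lookup of the statistical potential
    for residue type res with a covered contact points (0..30).
    """
    return eps_s * sum(
        potential[res][a] for res, a in zip(residues, covered_counts)
    )
```

Storing the potential per integer count of covered points avoids interpolating over the coverage fraction, since only 31 values per residue type are possible.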

The above approach to the hydrophobic interactions allows suppression of 
previously employed centro-symmetric one-body potentials 6 and thereby opens up 
the present approach to multi-domain and multi-meric proteins. In this example, 
both models of mean field hydrophobic interactions were used in parallel. 

The force field designed for this model is entirely of a "knowledge-based" 
origin. Some terms, such as the generic short- and long-range potentials, provide a 
bias toward protein-like short- and long-range correlations in the model chain. 
These potentials generalize regularities seen in native structures of all globular 